performance: cache parsed markers, constraints and versions #556

radoering · 2023-02-19T06:26:55Z

Some more fallout from #530. Parsing marker strings became a bit slower, with measurable effects in real-world examples. In such projects, the same marker seems to be used many times for different dependencies. Since markers are immutable, we can cache the parsed markers to mitigate the performance regression.
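To illustrate the idea, here is a minimal sketch of memoizing a parser whose results are immutable. `ToyMarker` and `parse_toy_marker` are hypothetical stand-ins with a toy grammar, not the actual poetry-core classes or the code in this PR; `functools.lru_cache` is just one way to do it.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class ToyMarker:
    """Hypothetical immutable stand-in for a parsed marker."""

    name: str
    op: str
    value: str


@lru_cache(maxsize=None)
def parse_toy_marker(text: str) -> ToyMarker:
    """Parse strings like 'python_version >= "3.8"' (toy grammar only)."""
    name, op, value = text.split(maxsplit=2)
    return ToyMarker(name, op, value.strip('"'))


# The second call with the same string skips parsing and returns
# the object that was cached by the first call.
first = parse_toy_marker('python_version >= "3.8"')
second = parse_toy_marker('python_version >= "3.8"')
```

Sharing one object between callers is only safe because the result cannot be mutated; with a mutable result, one caller's change would leak to every other caller of the cached function.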

Performance for shootout example with warm cache:

| command | 1.3.2 | master | PR (only markers) | PR (markers, constraints, versions) |
| --- | --- | --- | --- | --- |
| lock | 46 s | 57 s | 47 s | 35 s |
| lock --no-update | 14 s | 18 s | 14 s | 10 s |

(There seems to be no significant increase in peak memory usage due to the caching.)

Update: added times for caching parsed constraints and versions in addition to markers.

src/poetry/core/version/markers.py

dimbleby · 2023-02-19T11:23:59Z

looks fine to me though fwiw I don't reproduce any performance delta on the "big" install that I have lying around.

dimbleby · 2023-02-19T11:38:11Z

if this is real: I wonder whether constraint parsing and version parsing would similarly benefit

radoering · 2023-02-19T14:20:56Z

> if this is real: I wonder whether constraint parsing and version parsing would similarly benefit

Great idea. Although there is more variance in constraints and versions than in markers, caching them makes a significant difference, too.

I updated the times in the PR description. Further, I tested the change on another largish (non-public) project:

| command | without PR | only markers | markers, constraints, versions |
| --- | --- | --- | --- |
| poetry lock --no-update | 150 s | 135 s | 115 s |

Cache info for private project:

CacheInfo(hits=66985, misses=78, maxsize=None, currsize=78) poetry.core.version.markers.parse_marker
CacheInfo(hits=452083, misses=739, maxsize=None, currsize=739) poetry.core.constraints.version.parser.parse_constraint
CacheInfo(hits=73960, misses=196, maxsize=None, currsize=196) poetry.core.constraints.generic.parser.parse_constraint
CacheInfo(hits=19933, misses=581, maxsize=None, currsize=580) poetry.core.version.pep440.parser.PEP440Parser.parse

The peak memory usage even decreased when caching constraints and versions.
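For anyone who wants to collect similar numbers, here is a rough sketch of reading hit statistics from a function wrapped with `functools.lru_cache`. `normalize` and `hit_ratio` are hypothetical helpers for illustration, not part of poetry-core. By the same formula, the `parse_marker` line above works out to 66985 / (66985 + 78), roughly a 99.9 % hit rate.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def normalize(text: str) -> str:
    """Toy stand-in for an expensive parse step."""
    return text.strip().lower()


def hit_ratio(cached_func) -> float:
    """Fraction of calls served from the cache (0.0 if never called)."""
    info = cached_func.cache_info()
    total = info.hits + info.misses
    return info.hits / total if total else 0.0


# Repeated inputs are served from the cache; each distinct input
# string is computed (and counted as a miss) exactly once.
for raw in ["A", "a ", "A", "A", "b", "A"]:
    normalize(raw)

print(normalize.cache_info(), hit_ratio(normalize))
```

Note that the cache key is the raw argument, so `"A"` and `"a "` are separate entries even though they normalize to the same result.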